{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>\n",
    "\n",
    "<i>Licensed under the MIT License.</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Benchmark with Movielens dataset\n",
    "\n",
    "This illustrative comparison applies to collaborative filtering algorithms available in this repository such as Spark ALS, Surprise SVD, SAR and others using the Movielens dataset. These algorithms are usable in a variety of recommendation tasks, including product or news recommendations.\n",
    "\n",
    "The main purpose of this notebook is not to produce comprehensive benchmarking results on multiple datasets. Rather, it is intended to illustrate on how one could evaluate different recommender algorithms using tools in this repository.\n",
    "\n",
    "## Experimentation setup:\n",
    "\n",
    "* Objective\n",
    "  * To compare how each collaborative filtering algorithm perform in predicting ratings and recommending relevant items.\n",
    "\n",
    "* Environment\n",
    "  * The comparison is run on a [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). \n",
    "  * The virtual machine size is Standard NC6s_v2 (6 vcpus, 112 GB memory, 1P100 GPU).\n",
    "  * It should be noted that the single node DSVM is not supposed to run scalable benchmarking analysis. Either scaling up or out the computing instances is necessary to run the benchmarking in an run-time efficient way without any memory issue.\n",
    "  * **NOTE ABOUT THE DEPENDENCIES TO INSTALL**: This notebook uses CPU, GPU and PySpark algorithms, so make sure you install the `full environment` as detailed in the [SETUP.md](../SETUP.md). \n",
    "  \n",
    "* Datasets\n",
    "  * [Movielens 100K](https://grouplens.org/datasets/movielens/100k/).\n",
    "  * [Movielens 1M](https://grouplens.org/datasets/movielens/1m/).\n",
    "\n",
    "* Data split\n",
    "  * The data is split into train and test sets.\n",
    "  * The split ratios are 75-25 for train and test datasets.\n",
    "  * The splitting is stratified based on items. \n",
    "\n",
    "* Model training\n",
    "  * A recommendation model is trained by using each of the collaborative filtering algorithms. \n",
    "  * Empirical parameter values reported [here](http://mymedialite.net/examples/datasets.html) are used in this notebook.  More exhaustive hyper parameter tuning would be required to further optimize results.\n",
    "\n",
    "* Evaluation metrics\n",
    "  * Ranking metrics:\n",
    "    * Precision@k.\n",
    "    * Recall@k.\n",
    "    * Normalized discounted cumulative gain@k (NDCG@k).\n",
    "    * Mean-average-precision (MAP). \n",
    "    * In the evaluation metrics above, k = 10. \n",
    "  * Rating metrics:\n",
    "    * Root mean squared error (RMSE).\n",
    "    * Mean average error (MAE).\n",
    "    * R squared.\n",
    "    * Explained variance.\n",
    "  * Run time performance\n",
    "    * Elapsed for training a model and using a model for predicting/recommending k items. \n",
    "    * The time may vary across different machines. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0 Globals settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "System version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n",
      "[GCC 7.3.0]\n",
      "Pandas version: 0.25.1\n",
      "PySpark version: 2.3.1\n",
      "Surprise version: 1.1.0\n",
      "PyTorch version: 1.2.0\n",
      "Fast AI version: 1.0.46\n",
      "Cornac version: 1.2.0\n",
      "Tensorflow version: 1.12.0\n",
      "CUDA version: CUDA Version 10.1.168\n",
      "CuDNN version: 7.5.1\n",
      "Number of cores: 48\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "sys.path.append(\"../\")\n",
    "import os\n",
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import pyspark\n",
    "import torch\n",
    "import fastai\n",
    "import tensorflow as tf\n",
    "import surprise\n",
    "\n",
    "from reco_utils.common.general_utils import get_number_processors\n",
    "from reco_utils.common.gpu_utils import get_cuda_version, get_cudnn_version\n",
    "from reco_utils.dataset import movielens\n",
    "from reco_utils.dataset.python_splitters import python_stratified_split\n",
    "\n",
    "from benchmark_utils import * \n",
    "\n",
    "print(\"System version: {}\".format(sys.version))\n",
    "print(\"Pandas version: {}\".format(pd.__version__))\n",
    "print(\"PySpark version: {}\".format(pyspark.__version__))\n",
    "print(\"Surprise version: {}\".format(surprise.__version__))\n",
    "print(\"PyTorch version: {}\".format(torch.__version__))\n",
    "print(\"Fast AI version: {}\".format(fastai.__version__))\n",
    "print(\"Cornac version: {}\".format(cornac.__version__))\n",
    "print(\"Tensorflow version: {}\".format(tf.__version__))\n",
    "print(\"CUDA version: {}\".format(get_cuda_version()))\n",
    "print(\"CuDNN version: {}\".format(get_cudnn_version()))\n",
    "n_cores = get_number_processors()\n",
    "print(\"Number of cores: {}\".format(n_cores))\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run parameters\n",
    "EPOCHS = 15"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hide fastai progress bar\n",
    "hide_fastai_progress_bar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fix random seeds to make sure out runs are reproducible\n",
    "np.random.seed(SEED)\n",
    "torch.manual_seed(SEED)\n",
    "torch.cuda.manual_seed_all(SEED)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "environments = {\n",
    "    \"als\": \"pyspark\",\n",
    "    \"sar\": \"python_cpu\",\n",
    "    \"svd\": \"python_cpu\",\n",
    "    \"fastai\": \"python_gpu\",\n",
    "    \"ncf\": \"python_gpu\",\n",
    "    \"bpr\": \"python_cpu\",\n",
    "}\n",
    "\n",
    "metrics = {\n",
    "    \"als\": [\"rating\", \"ranking\"],\n",
    "    \"sar\": [\"ranking\"],\n",
    "    \"svd\": [\"rating\", \"ranking\"],\n",
    "    \"fastai\": [\"rating\", \"ranking\"],\n",
    "    \"ncf\": [\"ranking\"],\n",
    "    \"bpr\": [\"ranking\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Algorithm parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "als_params = {\n",
    "    \"rank\": 10,\n",
    "    \"maxIter\": EPOCHS,\n",
    "    \"implicitPrefs\": False,\n",
    "    \"alpha\": 0.1,\n",
    "    \"regParam\": 0.05,\n",
    "    \"coldStartStrategy\": \"drop\",\n",
    "    \"nonnegative\": False,\n",
    "    \"userCol\": DEFAULT_USER_COL,\n",
    "    \"itemCol\": DEFAULT_ITEM_COL,\n",
    "    \"ratingCol\": DEFAULT_RATING_COL,\n",
    "}\n",
    "\n",
    "sar_params = {\n",
    "    \"similarity_type\": \"jaccard\",\n",
    "    \"time_decay_coefficient\": 30,\n",
    "    \"time_now\": None,\n",
    "    \"timedecay_formula\": True,\n",
    "    \"col_user\": DEFAULT_USER_COL,\n",
    "    \"col_item\": DEFAULT_ITEM_COL,\n",
    "    \"col_rating\": DEFAULT_RATING_COL,\n",
    "    \"col_timestamp\": DEFAULT_TIMESTAMP_COL,\n",
    "}\n",
    "\n",
    "svd_params = {\n",
    "    \"n_factors\": 150,\n",
    "    \"n_epochs\": EPOCHS,\n",
    "    \"lr_all\": 0.005,\n",
    "    \"reg_all\": 0.02,\n",
    "    \"random_state\": SEED,\n",
    "    \"verbose\": False\n",
    "}\n",
    "\n",
    "fastai_params = {\n",
    "    \"n_factors\": 40, \n",
    "    \"y_range\": [0,5.5], \n",
    "    \"wd\": 1e-1,\n",
    "    \"max_lr\": 5e-3,\n",
    "    \"epochs\": EPOCHS\n",
    "}\n",
    "\n",
    "ncf_params = {\n",
    "    \"model_type\": \"NeuMF\",\n",
    "    \"n_factors\": 4,\n",
    "    \"layer_sizes\": [16, 8, 4],\n",
    "    \"n_epochs\": EPOCHS,\n",
    "    \"batch_size\": 1024,\n",
    "    \"learning_rate\": 1e-3,\n",
    "    \"verbose\": 10\n",
    "}\n",
    "\n",
    "bpr_params = {\n",
    "    \"k\": 200,\n",
    "    \"max_iter\": EPOCHS,\n",
    "    \"learning_rate\": 0.075,\n",
    "    \"lambda_reg\": 1e-3,\n",
    "    \"seed\": SEED,\n",
    "    \"verbose\": False\n",
    "}\n",
    "\n",
    "params = {\n",
    "    \"als\": als_params,\n",
    "    \"sar\": sar_params,\n",
    "    \"svd\": svd_params,\n",
    "    \"fastai\": fastai_params,\n",
    "    \"ncf\": ncf_params,\n",
    "    \"bpr\": bpr_params,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "prepare_training_data = {\n",
    "    \"als\": prepare_training_als,\n",
    "    \"svd\": prepare_training_svd,\n",
    "    \"fastai\": prepare_training_fastai,\n",
    "    \"ncf\": prepare_training_ncf,\n",
    "    \"bpr\": prepare_training_bpr,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "prepare_metrics_data = {\n",
    "    \"als\": lambda train, test: prepare_metrics_als(train, test),\n",
    "    \"fastai\": lambda train, test: prepare_metrics_fastai(train, test),    \n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainer = {\n",
    "    \"als\": lambda params, data: train_als(params, data),\n",
    "    \"svd\": lambda params, data: train_svd(params, data),\n",
    "    \"sar\": lambda params, data: train_sar(params, data), \n",
    "    \"fastai\": lambda params, data: train_fastai(params, data),\n",
    "    \"ncf\": lambda params, data: train_ncf(params, data),\n",
    "    \"bpr\": lambda params, data: train_bpr(params, data),\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "rating_predictor = {\n",
    "    \"als\": lambda model, test: predict_als(model, test),\n",
    "    \"svd\": lambda model, test: predict_svd(model, test),\n",
    "    \"fastai\": lambda model, test: predict_fastai(model, test),\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "ranking_predictor = {\n",
    "    \"als\": lambda model, test, train: recommend_k_als(model, test, train),\n",
    "    \"sar\": lambda model, test, train: recommend_k_sar(model, test, train),\n",
    "    \"svd\": lambda model, test, train: recommend_k_svd(model, test, train),\n",
    "    \"fastai\": lambda model, test, train: recommend_k_fastai(model, test, train),\n",
    "    \"ncf\": lambda model, test, train: recommend_k_ncf(model, test, train),\n",
    "    \"bpr\": lambda model, test, train: recommend_k_bpr(model, test, train),\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "rating_evaluator = {\n",
    "    \"als\": lambda test, predictions: rating_metrics_pyspark(test, predictions),\n",
    "    \"svd\": lambda test, predictions: rating_metrics_python(test, predictions),\n",
    "    \"fastai\": lambda test, predictions: rating_metrics_python(test, predictions)\n",
    "}\n",
    "    \n",
    "    \n",
    "ranking_evaluator = {\n",
    "    \"als\": lambda test, predictions, k: ranking_metrics_pyspark(test, predictions, k),\n",
    "    \"sar\": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),\n",
    "    \"svd\": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),\n",
    "    \"fastai\": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),\n",
    "    \"ncf\": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),\n",
    "    \"bpr\": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_summary(data, algo, k, train_time, time_rating, rating_metrics, time_ranking, ranking_metrics):\n",
    "    summary = {\"Data\": data, \"Algo\": algo, \"K\": k, \"Train time (s)\": train_time, \"Predicting time (s)\": time_rating, \"Recommending time (s)\": time_ranking}\n",
    "    if rating_metrics is None:\n",
    "        rating_metrics = {\n",
    "            \"RMSE\": np.nan,\n",
    "            \"MAE\": np.nan,\n",
    "            \"R2\": np.nan,\n",
    "            \"Explained Variance\": np.nan,\n",
    "        }\n",
    "    if ranking_metrics is None:\n",
    "        ranking_metrics = {\n",
    "            \"MAP\": np.nan,\n",
    "            \"nDCG@k\": np.nan,\n",
    "            \"Precision@k\": np.nan,\n",
    "            \"Recall@k\": np.nan,\n",
    "        }\n",
    "    summary.update(rating_metrics)\n",
    "    summary.update(ranking_metrics)\n",
    "    return summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Benchmark loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_sizes = [\"100k\", \"1m\"] # Movielens data size: 100k, 1m, 10m, or 20m\n",
    "algorithms = [\"als\", \"svd\", \"sar\", \"ncf\", \"fastai\", \"bpr\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 4.81k/4.81k [00:01<00:00, 2.79kKB/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of Movielens 100k: (100000, 4)\n",
      "\n",
      "Computing als algorithm on Movielens 100k\n",
      "\n",
      "Computing svd algorithm on Movielens 100k\n",
      "\n",
      "Computing sar algorithm on Movielens 100k\n",
      "\n",
      "Computing ncf algorithm on Movielens 100k\n",
      "\n",
      "Computing fastai algorithm on Movielens 100k\n",
      "\n",
      "Computing bpr algorithm on Movielens 100k\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5.78k/5.78k [00:01<00:00, 3.10kKB/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of Movielens 1m: (1000209, 4)\n",
      "\n",
      "Computing als algorithm on Movielens 1m\n",
      "\n",
      "Computing svd algorithm on Movielens 1m\n",
      "\n",
      "Computing sar algorithm on Movielens 1m\n",
      "\n",
      "Computing ncf algorithm on Movielens 1m\n",
      "\n",
      "Computing fastai algorithm on Movielens 1m\n",
      "\n",
      "Computing bpr algorithm on Movielens 1m\n",
      "CPU times: user 48min 7s, sys: 4min 42s, total: 52min 49s\n",
      "Wall time: 50min 1s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# For each data size and each algorithm, a recommender is evaluated. \n",
    "cols = [\"Data\", \"Algo\", \"K\", \"Train time (s)\", \"Predicting time (s)\", \"RMSE\", \"MAE\", \"R2\", \"Explained Variance\", \"Recommending time (s)\", \"MAP\", \"nDCG@k\", \"Precision@k\", \"Recall@k\"]\n",
    "df_results = pd.DataFrame(columns=cols)\n",
    "\n",
    "for data_size in data_sizes:\n",
    "    # Load the dataset\n",
    "    df = movielens.load_pandas_df(\n",
    "        size=data_size,\n",
    "        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]\n",
    "    )\n",
    "    print(\"Size of Movielens {}: {}\".format(data_size, df.shape))\n",
    "    \n",
    "    # Split the dataset\n",
    "    df_train, df_test = python_stratified_split(df,\n",
    "                                                ratio=0.75, \n",
    "                                                min_rating=1, \n",
    "                                                filter_by=\"item\", \n",
    "                                                col_user=DEFAULT_USER_COL, \n",
    "                                                col_item=DEFAULT_ITEM_COL\n",
    "                                                )\n",
    "   \n",
    "    # Loop through the algos\n",
    "    for algo in algorithms:\n",
    "        print(\"\\nComputing {} algorithm on Movielens {}\".format(algo, data_size))\n",
    "          \n",
    "        # Data prep for training set\n",
    "        train = prepare_training_data.get(algo, lambda x:x)(df_train)\n",
    "        \n",
    "        # Get model parameters\n",
    "        model_params = params[algo]\n",
    "          \n",
    "        # Train the model\n",
    "        model, time_train = trainer[algo](model_params, train)\n",
    "                \n",
    "        # Predict and evaluate\n",
    "        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)\n",
    "        \n",
    "        if \"rating\" in metrics[algo]:   \n",
    "            # Predict for rating\n",
    "            preds, time_rating = rating_predictor[algo](model, test)\n",
    "                        \n",
    "            # Evaluate for rating\n",
    "            ratings = rating_evaluator[algo](test, preds)\n",
    "        else:\n",
    "            ratings = None\n",
    "            time_rating = np.nan\n",
    "        \n",
    "        if \"ranking\" in metrics[algo]:\n",
    "            # Predict for ranking\n",
    "            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)\n",
    "            \n",
    "            # Evaluate for rating\n",
    "            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)\n",
    "        else:\n",
    "            rankings = None\n",
    "            time_ranking = np.nan\n",
    "            \n",
    "        # Record results\n",
    "        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)\n",
    "        df_results.loc[df_results.shape[0] + 1] = summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Data</th>\n",
       "      <th>Algo</th>\n",
       "      <th>K</th>\n",
       "      <th>Train time (s)</th>\n",
       "      <th>Predicting time (s)</th>\n",
       "      <th>RMSE</th>\n",
       "      <th>MAE</th>\n",
       "      <th>R2</th>\n",
       "      <th>Explained Variance</th>\n",
       "      <th>Recommending time (s)</th>\n",
       "      <th>MAP</th>\n",
       "      <th>nDCG@k</th>\n",
       "      <th>Precision@k</th>\n",
       "      <th>Recall@k</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>100k</td>\n",
       "      <td>als</td>\n",
       "      <td>10</td>\n",
       "      <td>5.2630</td>\n",
       "      <td>0.0637</td>\n",
       "      <td>0.970278</td>\n",
       "      <td>0.756116</td>\n",
       "      <td>0.247909</td>\n",
       "      <td>0.243499</td>\n",
       "      <td>0.0937</td>\n",
       "      <td>0.004784</td>\n",
       "      <td>0.045017</td>\n",
       "      <td>0.047508</td>\n",
       "      <td>0.015420</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>100k</td>\n",
       "      <td>svd</td>\n",
       "      <td>10</td>\n",
       "      <td>4.2469</td>\n",
       "      <td>0.6350</td>\n",
       "      <td>0.938681</td>\n",
       "      <td>0.742690</td>\n",
       "      <td>0.291967</td>\n",
       "      <td>0.291971</td>\n",
       "      <td>17.1032</td>\n",
       "      <td>0.012873</td>\n",
       "      <td>0.095930</td>\n",
       "      <td>0.091198</td>\n",
       "      <td>0.032783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>100k</td>\n",
       "      <td>sar</td>\n",
       "      <td>10</td>\n",
       "      <td>0.2794</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1062</td>\n",
       "      <td>0.113028</td>\n",
       "      <td>0.388321</td>\n",
       "      <td>0.333828</td>\n",
       "      <td>0.183179</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>100k</td>\n",
       "      <td>ncf</td>\n",
       "      <td>10</td>\n",
       "      <td>70.0113</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.2467</td>\n",
       "      <td>0.102130</td>\n",
       "      <td>0.379410</td>\n",
       "      <td>0.331707</td>\n",
       "      <td>0.173080</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>100k</td>\n",
       "      <td>fastai</td>\n",
       "      <td>10</td>\n",
       "      <td>94.1108</td>\n",
       "      <td>0.0459</td>\n",
       "      <td>0.943084</td>\n",
       "      <td>0.744337</td>\n",
       "      <td>0.285308</td>\n",
       "      <td>0.287671</td>\n",
       "      <td>3.3439</td>\n",
       "      <td>0.025503</td>\n",
       "      <td>0.147866</td>\n",
       "      <td>0.130329</td>\n",
       "      <td>0.053824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>100k</td>\n",
       "      <td>bpr</td>\n",
       "      <td>10</td>\n",
       "      <td>0.6386</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.1675</td>\n",
       "      <td>0.105365</td>\n",
       "      <td>0.389948</td>\n",
       "      <td>0.349841</td>\n",
       "      <td>0.181807</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>1m</td>\n",
       "      <td>als</td>\n",
       "      <td>10</td>\n",
       "      <td>2.9332</td>\n",
       "      <td>0.0222</td>\n",
       "      <td>0.860514</td>\n",
       "      <td>0.679387</td>\n",
       "      <td>0.412306</td>\n",
       "      <td>0.406364</td>\n",
       "      <td>0.0778</td>\n",
       "      <td>0.002144</td>\n",
       "      <td>0.025975</td>\n",
       "      <td>0.032284</td>\n",
       "      <td>0.010219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>1m</td>\n",
       "      <td>svd</td>\n",
       "      <td>10</td>\n",
       "      <td>42.5560</td>\n",
       "      <td>3.5602</td>\n",
       "      <td>0.883017</td>\n",
       "      <td>0.695366</td>\n",
       "      <td>0.374910</td>\n",
       "      <td>0.374911</td>\n",
       "      <td>251.1241</td>\n",
       "      <td>0.008828</td>\n",
       "      <td>0.089320</td>\n",
       "      <td>0.082856</td>\n",
       "      <td>0.021582</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>1m</td>\n",
       "      <td>sar</td>\n",
       "      <td>10</td>\n",
       "      <td>2.3723</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.1282</td>\n",
       "      <td>0.066214</td>\n",
       "      <td>0.313502</td>\n",
       "      <td>0.279692</td>\n",
       "      <td>0.111135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>1m</td>\n",
       "      <td>ncf</td>\n",
       "      <td>10</td>\n",
       "      <td>868.2188</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>46.7333</td>\n",
       "      <td>0.063082</td>\n",
       "      <td>0.350428</td>\n",
       "      <td>0.322213</td>\n",
       "      <td>0.109614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>1m</td>\n",
       "      <td>fastai</td>\n",
       "      <td>10</td>\n",
       "      <td>745.9277</td>\n",
       "      <td>0.4392</td>\n",
       "      <td>0.874488</td>\n",
       "      <td>0.695508</td>\n",
       "      <td>0.386927</td>\n",
       "      <td>0.389560</td>\n",
       "      <td>57.9075</td>\n",
       "      <td>0.026265</td>\n",
       "      <td>0.184667</td>\n",
       "      <td>0.168561</td>\n",
       "      <td>0.055841</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>1m</td>\n",
       "      <td>bpr</td>\n",
       "      <td>10</td>\n",
       "      <td>5.2941</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.4489</td>\n",
       "      <td>0.067077</td>\n",
       "      <td>0.353277</td>\n",
       "      <td>0.324449</td>\n",
       "      <td>0.118385</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Data    Algo   K Train time (s) Predicting time (s)      RMSE       MAE  \\\n",
       "1   100k     als  10         5.2630              0.0637  0.970278  0.756116   \n",
       "2   100k     svd  10         4.2469              0.6350  0.938681  0.742690   \n",
       "3   100k     sar  10         0.2794                 NaN       NaN       NaN   \n",
       "4   100k     ncf  10        70.0113                 NaN       NaN       NaN   \n",
       "5   100k  fastai  10        94.1108              0.0459  0.943084  0.744337   \n",
       "6   100k     bpr  10         0.6386                 NaN       NaN       NaN   \n",
       "7     1m     als  10         2.9332              0.0222  0.860514  0.679387   \n",
       "8     1m     svd  10        42.5560              3.5602  0.883017  0.695366   \n",
       "9     1m     sar  10         2.3723                 NaN       NaN       NaN   \n",
       "10    1m     ncf  10       868.2188                 NaN       NaN       NaN   \n",
       "11    1m  fastai  10       745.9277              0.4392  0.874488  0.695508   \n",
       "12    1m     bpr  10         5.2941                 NaN       NaN       NaN   \n",
       "\n",
       "          R2  Explained Variance Recommending time (s)       MAP    nDCG@k  \\\n",
       "1   0.247909            0.243499                0.0937  0.004784  0.045017   \n",
       "2   0.291967            0.291971               17.1032  0.012873  0.095930   \n",
       "3        NaN                 NaN                0.1062  0.113028  0.388321   \n",
       "4        NaN                 NaN                4.2467  0.102130  0.379410   \n",
       "5   0.285308            0.287671                3.3439  0.025503  0.147866   \n",
       "6        NaN                 NaN                1.1675  0.105365  0.389948   \n",
       "7   0.412306            0.406364                0.0778  0.002144  0.025975   \n",
       "8   0.374910            0.374911              251.1241  0.008828  0.089320   \n",
       "9        NaN                 NaN                2.1282  0.066214  0.313502   \n",
       "10       NaN                 NaN               46.7333  0.063082  0.350428   \n",
       "11  0.386927            0.389560               57.9075  0.026265  0.184667   \n",
       "12       NaN                 NaN               20.4489  0.067077  0.353277   \n",
       "\n",
       "    Precision@k  Recall@k  \n",
       "1      0.047508  0.015420  \n",
       "2      0.091198  0.032783  \n",
       "3      0.333828  0.183179  \n",
       "4      0.331707  0.173080  \n",
       "5      0.130329  0.053824  \n",
       "6      0.349841  0.181807  \n",
       "7      0.032284  0.010219  \n",
       "8      0.082856  0.021582  \n",
       "9      0.279692  0.111135  \n",
       "10     0.322213  0.109614  \n",
       "11     0.168561  0.055841  \n",
       "12     0.324449  0.118385  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (reco_full)",
   "language": "python",
   "name": "reco_full"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
